Abstract: Multi-tier networks with large-array base stations
(BSs) that are able to operate in the "massive MIMO"
regime are envisioned to play a key role in meeting the
exploding wireless traffic demands. Operated over small cells
with reciprocity-based training, massive MIMO promises
large spectral efficiencies per unit area with low overheads.
Also, near-optimal user-BS association and resource allocation
are possible in cellular massive MIMO HetNets using
simple admission control mechanisms and rudimentary BS
schedulers, since scheduled user rates can be predicted a
priori with massive MIMO.

Reciprocity-based training naturally enables coordinated
multi-point transmission (CoMP), as each uplink pilot
inherently trains antenna arrays at all nearby BSs. In this
paper we consider a distributed-MIMO form of CoMP, which
improves cell-edge performance without requiring channel
state information exchanges among cooperating BSs. We
present methods for harmonized operation of distributed
and cellular massive MIMO in the downlink that optimize
resource allocation at a coarser time scale across
the network. We also present scheduling policies at the
resource block level that aim to approach the optimal
allocations. Simulations reveal that the proposed methods can
significantly outperform the network-optimized cellular-only
massive MIMO operation (i.e., operation without CoMP),
especially at the cell edge.

I. Introduction

The exponential growth in wireless traffic is driving the
densification of cellular networks. Existing networks of
carefully planned conventional macro base stations (BSs)
are being transformed into dense irregular heterogeneous
networks (HetNets), as they are continuously supplemented
with various types of BSs, differing in transmit
power, physical size, and deployment cost [1].

It has been well recognized that traditional user-BS
association schemes are highly suboptimal for HetNets,
due to the large disparities in BS transmit power [2].
Moreover, the non-uniform user distribution and irregular
deployment of small BSs make load balancing critical.
Various approaches have been used to investigate load
balancing in HetNets [2], including stochastic geometry
approaches [3] and techniques from game theory [4]. Some
standardization efforts have also been made for load
balancing in HetNets, e.g., in the form of cell range
expansion [5].

Several recent works [6-9] recast load balancing in HetNets
as a network utility maximization (NUM) problem.
Paper [6] studies the optimal user-BS association problem
in HetNets and shows a great improvement in user rate
distribution via systematic load balancing. Papers [7,8]
consider the joint optimization of user association and
BS muting, referred to in 3GPP as enhanced inter-cell
interference coordination (eICIC).

In parallel, there is surging interest in equipping BSs
with large antenna arrays. With higher-frequency spectrum
becoming available, large arrays become feasible even for
small cells, as more effective antennas can be packed into
a small form factor^1. By exploiting channel reciprocity,
massive arrays can be trained in the uplink (whether for
uplink or downlink transmission) with low overheads [10].
This enables very large spectral efficiencies per unit area
via massive MIMO, i.e., via simultaneously serving many
users (although far fewer than the number of antennas),
each at a very high rate [10-13]. Attributes of massive MIMO
can also be exploited to achieve near-optimal load balancing
over massive MIMO HetNets using simple user-BS association
methods with cellular transmission (where data for each
user is transmitted from a single BS) [9].

In this paper we consider the use of coordinated multi-point
transmission (CoMP) as a means for improving
network performance, in particular, the cell-edge
performance. CoMP is naturally enabled by reciprocity-based
training, since a single uplink pilot from a user terminal
can train all nearby antennas. In regular cellular layouts
with massive MIMO BSs, [11] shows gains to cell-edge
users via CoMP.

In this paper we focus on a distributed-MIMO form of
CoMP, which does not require channel state information
(CSI) exchanges among cooperating BSs and allows us to
develop a systematic approach for allocating resources for
CoMP and cellular transmission. The methods we develop
are based on formulating a NUM problem with respect to
user association and resource allocation via extensions of
the framework for cellular transmission developed in [9].
We also present scheduling policies at the resource block
(RB) level, which aim to approach these optimized
(coarser time scale) resource allocations. Simulations show
that the proposed harmonized CoMP/cellular operation
can provide significant gains with respect to cellular-only
massive MIMO operation [9], especially at the cell edge.

^1 For example, at 5GHz, 49 antennas (arranged on a square grid at
half-wavelength spacing) can be packed on a 20cm x 20cm antenna patch.

II. System Model 

We consider the downlink of a cellular network comprised
of $J$ BSs and $K$ single-antenna users. We use
$j \in \mathcal{J} = \{1, 2, \ldots, J\}$ and
$k \in \mathcal{U} = \{1, 2, \ldots, K\}$
to index BSs and users, respectively. We let $M_j$ denote
the number of antennas at BS $j$ and assume $M_j \gg 1$.
We assume time division duplex (TDD) operation with
reciprocity-based CSI acquisition [10,11]. Hence, every
BS antenna in the vicinity of user $k$ can estimate its
downlink channel coefficient to user $k$ from the uplink
pilot transmitted by user $k$. This enables the training of
large antenna arrays (e.g., $M_j \gg 1$) with pilot overheads
proportional to the number of simultaneously served users
[10]. In contrast to feedback-based CSI acquisition, it also
allows a user terminal to train multiple nearby BSs, and
enables CoMP without additional training overheads.

Transmission resources are split into slots or RBs, with
each RB corresponding to a contiguous block of OFDM
subcarriers and symbols. In any given RB, we let $P_j$
denote the transmit power at BS $j$, and assume that this
power is equally split among all served user streams. We
assume a block-fading channel model where the user-BS
channels remain constant within each RB [10-12,14]. We
let $g_{kj} = \sqrt{\beta_{kj}}\, h_{kj}$ denote the $M_j \times 1$
channel vector between BS $j$ and user $k$ on a generic RB,
with the slow-fading scalar $\beta_{kj}$ characterizing the
combined effect of distance-based path loss and location-based
shadowing, and the vector $h_{kj}$ capturing fast fading. We
assume that the vectors $h_{kj}$ are independent in $k$ and $j$,
and that $h_{kj}$ has i.i.d. $\mathcal{CN}(0, 1)$ elements
(independent Rayleigh fading). We also assume that the thermal
noise process at user $k$ is i.i.d. with $\mathcal{CN}(0, \sigma^2)$
samples.
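As a rough illustration of this channel model, the following Python sketch draws one channel vector with i.i.d. complex Gaussian fast fading scaled by a slow-fading coefficient, and checks that the per-antenna channel energy concentrates around that coefficient when the array is large. The function name `sample_channel`, the value `beta = 0.25`, the array size, and the seed are illustrative assumptions, not values from the paper.

```python
import math
import random


def sample_channel(M, beta, rng):
    """Draw g = sqrt(beta) * h, where h has M i.i.d. CN(0, 1) entries.

    A CN(0, 1) sample has independent real and imaginary parts,
    each Gaussian with variance 1/2.
    """
    s = math.sqrt(0.5)
    scale = math.sqrt(beta)
    return [scale * complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(M)]


rng = random.Random(0)   # fixed seed so the sketch is reproducible
beta = 0.25              # hypothetical slow-fading coefficient
M = 20000                # deliberately large array to expose the concentration
g = sample_channel(M, beta, rng)

# Per-antenna energy ||g||^2 / M; this should be close to beta for large M.
norm_sq_per_antenna = sum(abs(x) ** 2 for x in g) / M
```

This concentration of the channel energy is the kind of "hardening" behavior that makes instantaneous rates predictable in the massive MIMO regime.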

III. MIMO Transmission 

Within each RB, a subset of users are active, i.e., are 
scheduled for transmission. The coded data for any given 
scheduled user can be transmitted either from a single 
BS via cellular transmission, or from multiple BSs via a 
CoMP mode referred to as distributed MIMO transmission. 

A. Prior Art: Cellular Massive MIMO [9] 

In setting the stage for the distributed MIMO operation
presented in this work, it is worth revisiting load balancing
and scheduling for cellular massive MIMO, as considered
in [9]. Let $S_j$ denote the maximum number of users served
by BS $j$ on any given RB, with $S_j \le M_j$. Under mild
assumptions on fading, the achievable user instantaneous
rates on RB $t$, $r_{kj}(t)$, can be predicted a priori in
the massive MIMO regime [9]. In particular, there exist
deterministic quantities $\{r_{kj}\}$ such that
$r_{kj}(t) \to r_{kj}$ for all $k \in \mathcal{U}$ and
$j \in \mathcal{J}$, as $M_j, S_j \to \infty$ with the ratio
$S_j/M_j$ held fixed [10-12]. This convergence is very fast
with respect to the $M_j$'s.

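Letting $S_j(t)$ denote the set of users served by BS
$j$ on RB $t$ and
$x_{kj} = \lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} 1\{k \in S_j(t)\}$
denote the activity fraction of user $k$ on BS $j$, the long-term
averaged throughput of user $k$ can be obtained via [9]

$$R_k = \sum_{j \in \mathcal{J}} x_{kj}\, r_{kj}, \quad k \in \mathcal{U}. \tag{1}$$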

The advantages of the approach in [9] for cellular
massive MIMO operation can be summarized as follows:

(A) The $r_{kj}$'s are accurate peak-rate proxies, which are
independent of scheduled instances and user sets.

(B) User throughputs depend on activity fractions, via (1).

(C) The (combinatorial) user-cell association problem is
recast as a (convex) NUM problem with respect to
the $\{x_{kj}\}$ variables, subject to resource constraints.

(D) Any $\{x_{kj}\}$ set not violating any resource constraints
can be realized by a suitably designed scheduler.
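To make attribute (B) concrete, here is a minimal Python sketch that evaluates the throughput expression (1) from a table of peak rates and activity fractions. The function name and the two-user, two-BS numbers are hypothetical and serve only as an illustration.

```python
def long_term_throughput(x, r):
    """Return [R_k] with R_k = sum_j x[k][j] * r[k][j], as in (1).

    x[k][j] is the activity fraction of user k on BS j and
    r[k][j] is the corresponding peak rate.
    """
    return [
        sum(x_kj * r_kj for x_kj, r_kj in zip(x_k, r_k))
        for x_k, r_k in zip(x, r)
    ]


# Hypothetical example: 2 users, 2 BSs.
peak_rates = [[4.0, 1.0],   # user 0: strong link to BS 0
              [0.5, 3.0]]   # user 1: strong link to BS 1
activity = [[0.5, 0.0],     # user 0 is active on BS 0 in half of the RBs
            [0.0, 0.25]]    # user 1 is active on BS 1 in a quarter of the RBs

throughputs = long_term_throughput(activity, peak_rates)
```

Because the peak rates do not depend on who else is scheduled, the throughputs are linear in the activity fractions, which is what allows the association problem to be posed as a convex program.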

B. CoMP via Distributed MIMO 

The distributed MIMO scheme we consider corresponds 
to a form of CoMP that allows harvesting performance 
gains at the cell edge, with low operational overheads. 

Definition 1. Admissible Distributed MIMO Schemes:

An admissible distributed MIMO scheme is a scheme that
schedules transmissions for users on a sequence of RBs
and, on each RB, satisfies the following:

(i) All the users served by a given BS $j$ are served in
clusters of the same size $L$ for some $L \ge 1$.

(ii) BS $j$ serves at most $S_j(L)$ users, for some fixed
$S_j(L)$, satisfying $S_j \le S_j(L) \le L S_j$.

(iii) The user beams (i.e., the precoding vectors) at BS
$j$ are designed as if BS $j$ were engaging in cellular
MU-MIMO transmission over all the users it serves.

(iv) All BSs serving a user transmit the same coded user
stream. Each BS transmits the stream on a beam that
is (independently) designed for the users at that BS.

(v) $M_j \ge S_j(L)$, for all $L$ and $j$ considered.

We also assume that, within each RB, the transmit 
power at each BS is equally split among scheduled users. 

Table I provides an example of a scheme complying
with Defn. 1, assuming BS clusters of size 1 (cellular
transmission) and 2. Four BSs are considered with $P_j = 1$,
$S_j(1) = S_j = 2$, and $S_j(2) = 3$. As the table reveals, each
BS on RB #1 engages in cellular transmission. On RB #2,
BS pairs jointly serve triplets of users. RBs #3 and #4
provide additional, more interesting, modes. No two users
are served by the same BS cluster on RB #3, while on RB
#4 BSs 1-2 jointly serve a triplet of users and BSs 3-4
serve users in cellular transmission. Note also that (at least)
8, 6, 6 and 7 uplink pilot dimensions are needed to enable
RBs #1, #2, #3 and #4, respectively. Evidently, the choice
of scheduled user sizes, $S_j(L)$, determines how aggressively
pilot dimensions are reused across the network.
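The bookkeeping in this example can be checked mechanically. The Python sketch below transcribes the served-user sets of Table I and recomputes the pilot dimensions, the per-user power under equal splitting, and the cluster serving each user. The data layout and helper names are illustrative choices, not part of the paper.

```python
from fractions import Fraction

# Served-user sets per BS on each RB, transcribed from Table I.
TABLE_I = {
    1: {1: {1, 2}, 2: {3, 4}, 3: {5, 6}, 4: {7, 8}},
    2: {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4, 5, 6}, 4: {4, 5, 6}},
    3: {1: {1, 2, 3}, 2: {1, 4, 5}, 3: {2, 4, 6}, 4: {3, 5, 6}},
    4: {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4, 5}, 4: {6, 7}},
}
P = 1  # transmit power per BS, as in the example


def pilot_dimensions(rb):
    """Number of distinct scheduled users (one uplink pilot each)."""
    return len(set().union(*rb.values()))


def user_power(rb):
    """Per-user power at each BS under equal power splitting."""
    return {bs: Fraction(P, len(users)) for bs, users in rb.items()}


def user_clusters(rb):
    """Map each user to the set of BSs that serve it on this RB."""
    clusters = {}
    for bs, users in rb.items():
        for u in users:
            clusters.setdefault(u, set()).add(bs)
    return clusters


pilots = [pilot_dimensions(TABLE_I[n]) for n in (1, 2, 3, 4)]
clusters_rb3 = user_clusters(TABLE_I[3])
```

On RB #3 every user maps to a different BS pair, which is the "user-specific cluster" mode that distinguishes this scheme from fixed-cluster CoMP.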

TABLE I

Example of RBs enabled by distributed MIMO over 4 BSs.

RB                    BS 1     BS 2     BS 3     BS 4
#1    Cluster Size    1        1        1        1
      User Power      1/2      1/2      1/2      1/2
      Served Users    1,2      3,4      5,6      7,8
#2    Cluster Size    2        2        2        2
      User Power      1/3      1/3      1/3      1/3
      Served Users    1,2,3    1,2,3    4,5,6    4,5,6
#3    Cluster Size    2        2        2        2
      User Power      1/3      1/3      1/3      1/3
      Served Users    1,2,3    1,4,5    2,4,6    3,5,6
#4    Cluster Size    2        2        1        1
      User Power      1/3      1/3      1/2      1/2
      Served Users    1,2,3    1,2,3    4,5      6,7

It is worth making a few remarks regarding the choice
of the distributed MIMO schemes of Defn. 1. First, the
schemes of Defn. 1 provide the following CoMP benefits:

(i) Performance gains at the cell edge: The beamforming
(BF) gain provided by the cellular scheme
(that the distributed scheme is based upon) becomes
intra-cluster BF gain in the distributed MIMO case,
as the same coded data is transmitted from all BSs
serving the user. Similarly, the intra-cell interference
mitigation capabilities of the cellular scheme are
extended across the cluster of BSs from which the
user is served. As a result, performance gains can be
realized at the cell edge.

(ii) Low training overheads: An uplink pilot from a
user terminal trains all nearby BS antennas, whether
these are in one or many locations. Thus, CSI acquisition
between a user and nearby BSs need not
incur additional overheads with respect to cellular
transmission.

In addition, the schemes of Defn. 1 possess several important
properties that are not in general present in CoMP
schemes:

(a) Local precoding at each BS: This is due to item (iii)
in Defn. 1. For instance, in the case of linear zero-forcing
beamforming (LZFBF), the beam for each user
served by BS j is chosen within the null space of the
channels of all the other users served by BS j, no
matter whether there are additional BSs serving the
user on the same RB or not.

(b) No need for CSI exchanges among BSs: Again, due
to item (iii) in Defn. 1, BS j only needs CSI between
the users it serves and the antennas of BS j in order
to generate the user beams at BS j.

(c) Flexible scheduling: The schemes of Defn. 1 enable
user-specific BS-cluster transmission, and allow serving
users from overlapping but different clusters of
BSs on the same RB (see, e.g., RB #3 in Table I).

(d) Simple predictors of instantaneous rates: As shown
in [15], the instantaneous user rates can also be
predicted a priori with CoMP. However, unlike general
CoMP settings, where a user's instantaneous rate depends
on the other users co-scheduled for transmission
on the same RB [15], the schemes of Defn. 1 make a
user's instantaneous rate independent of the identities
of the other users in the scheduling set.

As a result, the cellular-transmission attributes (A)-(C)
exploited in [9] can be appropriately extended to allow
resource allocation for the schemes of Defn. 1, in the form
of network-optimized activity fractions between users and
clusters of BSs. As it turns out, item (D) is
not always true with distributed MIMO, i.e., these activity
fractions may not necessarily be realizable, as shown in
Sec. VI. Nevertheless, scheduling policies can be designed that
approximate these fractions sufficiently well in practice.


IV. Peak Rates and Scheduled Throughputs 


In this section, we develop proxy expressions for the
instantaneous user rates and for the scheduled user throughputs
that are provided by any given scheduling policy enabling
distributed MIMO transmission with either LZFBF
or maximum ratio transmission (MRT).

We consider a scheduling policy on RBs $\{1, 2, \ldots, T\}$
and assume that all the large-scale coefficients stay fixed
within this period. Any such scheduling policy can be described
in terms of the scheduling sets
$\{S_C(t);\ \forall C,\ \forall t \in \{1, 2, \ldots, T\}\}$,
where $S_C(t)$ denotes the set of active users
served by cluster $C$ on RB $t$. Thus, the received signal at
an active user $k \in S_C(t)$ on RB $t$ can be expressed by


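$$
y_k(t) =
\underbrace{\sum_{j \in C} \sqrt{\frac{P_j}{S_j(|C|)}}\, g_{kj}^H f_{kj}\, s_k}_{\text{desired}}
+ \underbrace{\sum_{j \in C} \sum_{\substack{u \in S_C(t) \\ u \neq k}} \sqrt{\frac{P_j}{S_j(|C|)}}\, g_{kj}^H f_{uj}\, s_u}_{\text{intra-cluster interference}}
+ \underbrace{\sum_{l \notin C} \sum_{u \in S_{C'}(t)} \sqrt{\frac{P_l}{S_l(|C'|)}}\, g_{kl}^H f_{ul}\, s_u}_{\text{inter-cluster interference}}
+ n_k(t),
\tag{2}
$$

where $C$ denotes the cluster (set) of BSs serving user $k$ on
RB $t$, $C'$ denotes the cluster including BS $l$, $s_k$ denotes
the unit-power stream for user $k$, $f_{uj}$ denotes the
unit-norm precoding vector for user $u$ at BS $j$, and $n_k(t)$
is the thermal noise at user $k$.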

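Let $r_{kC}$ denote the peak rate of user $k$ from BS cluster
$C$. It can be shown using the techniques in [15,16] that,
with distributed MIMO based on LZFBF, $r_{kC}$ is given by^2

$$
r_{kC} = \log_2\left(1 +
\frac{\sum_{j \in C} \sum_{l \in C} \sqrt{P_j P_l \beta_{kj} \beta_{kl}}\; b_j(|C|)\, b_l(|C|)}
{1 + \sum_{l \notin C} P_l \beta_{kl}}\right),
\tag{3}
$$

where $b_j(L) = \sqrt{\frac{M_j - S_j(L) + 1}{S_j(L)}}$. Similarly, for the
case that the distributed MIMO transmission is based on MRT,

$$
r_{kC} = \log_2\left(1 +
\frac{\sum_{j \in C} \sum_{l \in C} \sqrt{\frac{P_j P_l M_j M_l \beta_{kj} \beta_{kl}}{S_j(|C|)\, S_l(|C|)}}}
{1 + I_{kC} + \sum_{l \notin C} P_l \beta_{kl}}\right),
\tag{4}
$$

where $I_{kC} = \sum_{j \in C} \frac{S_j(|C|) - 1}{S_j(|C|)} P_j \beta_{kj}$ is the
intra-cluster interference. For completeness, we provide the proofs
of (3) and (4) in Appendices A and B, respectively.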


^2 Expression (3) assumes that, for all $j \in C$, BS $j$ serves $S_j(L)$
users, each user at power $P_j/S_j(L)$. In the case that fewer users are
served by one of the BSs, the LHS in (3) represents an achievable
(lower bound) rate.
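For readers who want to experiment with the peak-rate proxies, the sketch below evaluates them numerically in Python. It assumes the LZFBF expression has signal term $(\sum_{j \in C} \sqrt{P_j \beta_{kj}}\, b_j)^2$ with $b_j = \sqrt{(M_j - S_j + 1)/S_j}$, that the MRT expression has signal term $(\sum_{j \in C} \sqrt{P_j M_j \beta_{kj}/S_j})^2$ plus an intra-cluster interference term $\sum_{j \in C} \frac{S_j - 1}{S_j} P_j \beta_{kj}$, and that noise power is normalized to 1, which is how the appendix derivations read. All function names and numeric values are hypothetical.

```python
import math


def peak_rate_lzfbf(cluster, P, beta, M, S):
    """Peak rate of one user served by `cluster` with LZFBF.

    P, beta, M are dicts keyed by BS index (beta holds this user's
    large-scale gains); S maps each BS in the cluster to S_j(|C|).
    Noise power is assumed to be normalized to 1.
    """
    amp = sum(
        math.sqrt(P[j] * beta[j] * (M[j] - S[j] + 1) / S[j]) for j in cluster
    )
    interference = sum(P[l] * beta[l] for l in P if l not in cluster)
    return math.log2(1 + amp ** 2 / (1 + interference))


def peak_rate_mrt(cluster, P, beta, M, S):
    """Peak rate of one user served by `cluster` with MRT (same conventions)."""
    amp = sum(math.sqrt(P[j] * M[j] * beta[j] / S[j]) for j in cluster)
    intra = sum((S[j] - 1) / S[j] * P[j] * beta[j] for j in cluster)
    inter = sum(P[l] * beta[l] for l in P if l not in cluster)
    return math.log2(1 + amp ** 2 / (1 + intra + inter))


# Hypothetical cell-edge user: equal gains to two BSs with 100 antennas each.
P = {1: 1.0, 2: 1.0}
beta = {1: 1.0, 2: 1.0}
M = {1: 100, 2: 100}

cellular = peak_rate_lzfbf([1], P, beta, M, {1: 10})
distributed = peak_rate_lzfbf([1, 2], P, beta, M, {1: 20, 2: 20})
cellular_mrt = peak_rate_mrt([1], P, beta, M, {1: 10})
```

In this toy setting the two-BS cluster both adds beamforming gain and removes the dominant interferer, which is why the distributed rate exceeds the cellular one for an equidistant user.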
Similar to cellular massive MIMO in [9], the long-term
user throughput with the admissible distributed MIMO
schemes of Defn. 1 can be expressed in terms of the
distributed MIMO peak rates and the activity fractions
provided by the scheduling policy. In the limit $T \to \infty$,
the throughput of user $k$ can be expressed as^3

$$R_k = \sum_{C} x_{kC}\, r_{kC}, \tag{5}$$

where $x_{kC} = \lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} 1\{k \in S_C(t)\}$
is the activity fraction of user $k$ with respect to cluster $C$.

Fig. 1 shows an example of the potential benefits that
distributed MIMO transmission can offer, by showing
the peak rates of cellular vs pair-BS distributed MIMO
transmission as a function of user location. When the user
is close to the transmitting BS, cellular transmission is as
good as anything. On the other hand, when the user is close
to the cell edge between BSs 1 and 2, distributed MIMO
transmission from the cluster {1,2} yields about 3 times
higher rates than cellular transmission. In the next section
we formally consider the problem of allocating resources
to users across BSs or clusters of BSs so as to optimize
the network-wide system performance.


V. User-Cluster Association as NUM

In this section, we formulate the user-cluster association 
problem as a NUM of activity fractions across all users, 
to optimize a network-wide utility function capturing the 
operator’s notion of (inherently subjective) fairness. 

Before formulating the NUM problem, it is worth
restricting the domain of scheduling options in order to
obtain solutions that are of practical interest. We focus on
cluster sizes $L \in \{1, 2, \ldots, L_{\max}\}$ for some appropriately
chosen maximum^4 cluster size, $L_{\max}$. Motivated by the
example in Table I, we consider the following architecture.

Definition 2. Uniform Cluster-Size Architecture (UCS):

A scheme from Defn. 1 is a UCS architecture, if for each
$L \in \{1, 2, \ldots, L_{\max}\}$, a $\lambda_L \ge 0$ fraction of the RBs is
allocated to serving size-$L$ clusters, and if on any RB from
this $\lambda_L$ fraction the following are satisfied:

(i) each scheduled user is served by a (user-dependent)
cluster of $L$ BSs;

(ii) for each $j \in \mathcal{J}$, BS $j$ serves no more than $S_j(L)$ users.

In the UCS architecture, users served by different-size
clusters are scheduled on distinct RBs. For the example in
Table I, such an architecture enables scheduling policies
with RBs of types #1, #2, and #3, but not of type #4.


^3 Convergence to the limiting expressions of interest is very quick [9].

^4 The choice of $L_{\max}$ is a design choice. It depends on the average
number of nearby BS arrays that users typically see and the complexity
that can be afforded. In our simulations, we set $L_{\max} = 4$.


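The NUM subject to the UCS architecture is

$$
\begin{aligned}
\max_{\{x_{kC}\},\,\{\lambda_L\}} \quad
& \sum_{k \in \mathcal{U}} U\Big(\sum_{C:\, |C| \le L_{\max}} x_{kC}\, r_{kC}\Big) && \text{(6a)} \\
\text{s.t.} \quad
& \sum_{\substack{C:\, j \in C \\ |C| = L}} \sum_{k \in \mathcal{U}} x_{kC} \le \lambda_L S_j(L),
  \quad \forall j,\ \forall L \le L_{\max}, && \text{(6b)} \\
& \sum_{C:\, |C| = L} x_{kC} \le \lambda_L,
  \quad \forall k \in \mathcal{U},\ \forall L \le L_{\max}, && \text{(6c)} \\
& x_{kC} \ge 0, \quad \forall k \in \mathcal{U},\ \forall C \text{ with } |C| \le L_{\max}, && \text{(6d)} \\
& \sum_{L=1}^{L_{\max}} \lambda_L \le 1, && \text{(6e)} \\
& \lambda_L \ge 0, \quad \forall L \le L_{\max}. && \text{(6f)}
\end{aligned}
$$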

Ineq. (6b) signifies that the total activity fractions of users
served by BS $j$ in clusters of size $L$ cannot exceed the
product of the fraction of available RBs and the maximum number
of beams that can be spatially multiplexed at BS $j$ in clusters
of size $L$. Ineq. (6c) signifies that the fraction of RBs over
which user $k$ is served by clusters of size $L$ cannot exceed
the RB fraction allocated to size-$L$ clusters.

It is easy to verify that (6) is a convex optimization
problem. Also note that, for $L_{\max} = 1$, (6) specializes to
the cellular massive MIMO NUM problem studied in [9].
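A small Python sketch can help in reading the constraint set. It checks a candidate allocation against per-BS load, per-user activity, non-negativity, and RB-fraction constraints, and evaluates a proportional-fair (log) objective. It assumes the RB-fraction constraint has the form "the fractions sum to at most 1" and uses hypothetical names and numbers throughout; it is a feasibility checker, not a solver.

```python
import math


def is_ucs_feasible(x, lam, S, users, bss, tol=1e-9):
    """Check an allocation against the UCS constraints.

    x:   {(user, cluster): activity fraction}, cluster = frozenset of BSs
    lam: {L: fraction of RBs given to size-L clusters}
    S:   {(bs, L): max users served by that BS in size-L clusters}
    """
    if any(len(C) not in lam for (_, C) in x):
        return False  # cluster size with no RBs allocated
    if any(v < -tol for v in x.values()) or any(v < -tol for v in lam.values()):
        return False  # non-negativity of fractions
    if sum(lam.values()) > 1 + tol:
        return False  # RB fractions cannot exceed the whole band (assumed form)
    for L, lam_L in lam.items():
        for j in bss:  # per-BS load for size-L clusters
            load = sum(v for (_, C), v in x.items() if len(C) == L and j in C)
            if load > lam_L * S[(j, L)] + tol:
                return False
        for k in users:  # per-user activity for size-L clusters
            act = sum(v for (u, C), v in x.items() if u == k and len(C) == L)
            if act > lam_L + tol:
                return False
    return True


def pf_utility(x, r, users):
    """Proportional-fair utility: sum over users of log(throughput)."""
    return sum(
        math.log(sum(v * r[(u, C)] for (u, C), v in x.items() if u == k))
        for k in users
    )


# Hypothetical toy network: 2 users, 2 BSs, cluster sizes 1 and 2.
users, bss = [1, 2], [1, 2]
c1, c2, c12 = frozenset({1}), frozenset({2}), frozenset({1, 2})
S = {(1, 1): 1, (2, 1): 1, (1, 2): 2, (2, 2): 2}
lam = {1: 0.5, 2: 0.5}
x = {(1, c1): 0.5, (2, c2): 0.5, (1, c12): 0.5, (2, c12): 0.5}
r = {key: 2.0 for key in x}

feasible = is_ucs_feasible(x, lam, S, users, bss)
utility = pf_utility(x, r, users)
```

All constraints are linear in the fractions and the objective is a sum of concave functions of linear terms, so a generic convex solver could be applied to the same data structures.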

The second architecture we consider also allows serving
users on RBs of the type of RB #4.

Definition 3. Mixed Cluster-Size Architecture (MCS):

A scheme from Defn. 1 is an MCS architecture, if a $\lambda_L \ge 0$
fraction of the RBs is allocated for each $L \in \{2, 3, \ldots, L_{\max}\}$,
and if within any RB that is part of the $\lambda_L$ fraction the
following are satisfied: (i) each scheduled user is served
either in cellular mode, or by a (user-dependent) cluster
of $L$ BSs; (ii) for each $j \in \mathcal{J}$, BS $j$ serves either at
most $S_j$ users, all in cellular mode, or at most $S_j(L)$ users,
all served in clusters of size $L$.

A convex NUM problem analogous to (6) can be
formulated for the MCS architecture.


VI. Scheduling Policies for NUM Solution

In this section, we investigate scheduling policies that
yield $\{x_{kC}\}$ closely matching the solution of (6).

Definition 4. Feasible Schedule: A scheduling policy
$\{S_C(t);\ \forall C \text{ with } |C| \le L_{\max},\ \forall t \in \{1, 2, \ldots, T\}\}$ is
feasible with respect to the UCS architecture of Defn. 2 if
it satisfies the following:

(i) For each $t$, the policy associates with RB $t$ a single
cluster size, $L(t)$, for some $L(t) \le L_{\max}$, i.e., for
each $C$ for which $S_C(t)$ is non-empty, $|C| = L(t)$.

(ii) For each $t$, each user is served by at most one cluster;
that is, $|\{C : k \in S_C(t)\}| \le 1$ for all $k \in \mathcal{U}$.

(iii) For each $t$, and for each $j \in \mathcal{J}$, BS $j$ serves at most
$S_j(L(t))$ users; that is, $\left|\bigcup_{C:\, j \in C} S_C(t)\right| \le S_j(L(t))$.

It is easy to verify that any feasible schedule yields 
activity fractions that satisfy (6b)-(6f). 
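The three feasibility conditions translate directly into a per-RB check. The following Python sketch is one possible encoding, with a schedule for a single RB represented as a mapping from BS clusters to served-user sets; the representation and the function name are assumptions made for illustration. The example values reuse the setting of Table I.

```python
def is_feasible_rb(schedule, S, L_max):
    """Check conditions (i)-(iii) of a feasible schedule on a single RB.

    schedule: {frozenset of BSs (cluster): set of users served by it}
    S:        {(bs, L): max users BS may serve in size-L clusters}
    """
    active = {C: users for C, users in schedule.items() if users}
    if not active:
        return True  # an idle RB violates nothing
    sizes = {len(C) for C in active}
    if len(sizes) != 1:
        return False  # (i): a single cluster size per RB
    L = sizes.pop()
    if L > L_max:
        return False
    seen = set()
    for users in active.values():
        if seen & users:
            return False  # (ii): each user served by at most one cluster
        seen |= users
    for j in set().union(*active):
        served = set().union(*(u for C, u in active.items() if j in C))
        if len(served) > S[(j, L)]:
            return False  # (iii): per-BS limit on served users
    return True


# Setting of Table I: S_j(1) = 2 and S_j(2) = 3 for all four BSs.
S = {(j, L): s for j in (1, 2, 3, 4) for L, s in ((1, 2), (2, 3))}
fs = frozenset
rb2 = {fs({1, 2}): {1, 2, 3}, fs({3, 4}): {4, 5, 6}}
rb3 = {fs({1, 2}): {1}, fs({1, 3}): {2}, fs({1, 4}): {3},
       fs({2, 3}): {4}, fs({2, 4}): {5}, fs({3, 4}): {6}}
rb4 = {fs({1, 2}): {1, 2, 3}, fs({3}): {4, 5}, fs({4}): {6, 7}}
```

With these inputs, RBs of types #2 and #3 pass the check, while the mixed-size RB #4 fails condition (i), in line with the UCS discussion.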
(a) Illustration of user location. (b) Instantaneous rate vs. x-axis coordinate in (a).

Fig. 1. Illustration of spectral efficiency versus user locations. The location in Fig. 1(b) indicates the x-axis coordinate of the path in Fig. 1(a).
The cell-edge users (e.g., the areas near the origin) benefit from distributed MIMO.

The converse does not hold in general. For instance, in a
network of 3 BSs, with $L_{\max} = 2$
and $S_j(2) = 3\ \forall j$, no feasible schedule yields $\{x_{kC}\}$
with $\lambda_2 > 0$, for which (6b) is satisfied with equality
for all $j$ and $L = 2$. This is because it is impossible
to simultaneously schedule 3 users at all three BSs: at
most two BSs can schedule 3 users, while the 3rd would
necessarily schedule at most 2 users (i.e., the 3 BSs
schedule a total of 4 users, each receiving beams from a
BS pair). Clearly, any feasible schedule results in at least
one strict inequality in (6b). Hence, the coarser time-scale
NUM problem (6) does not capture the finer time-scale
constraints associated with feasible schedulers. Although,
in general, (6) provides only an upper bound on the network
performance, as we show next, the activity fractions that
solve (6) can be used to design scheduling policies
whose performance is close to the utility provided by the
solution to (6).
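The counting argument in this example can be confirmed by brute force. The Python sketch below enumerates how many users are assigned to each of the three BS pairs on one RB and records the resulting per-BS loads; the function name is an illustrative choice.

```python
from itertools import product


def pair_schedules(S=3):
    """Yield (per-BS loads, number of users) for 3 BSs and size-2 clusters.

    n12, n13, n23 count the users served by BS pairs {1,2}, {1,3}, {2,3};
    a combination is kept only if no BS serves more than S users.
    """
    for n12, n13, n23 in product(range(S + 1), repeat=3):
        loads = (n12 + n13, n12 + n23, n13 + n23)
        if max(loads) <= S:
            yield loads, n12 + n13 + n23


schedules = list(pair_schedules())
fully_loaded = [s for s in schedules if s[0] == (3, 3, 3)]
best_loads, best_users = max(schedules, key=lambda s: sum(s[0]))
```

Since every user adds exactly 2 to the total load, the total is always even and can never reach 9; the best schedules carry 4 users with loads 3, 3 and 2.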


A. Virtual Queue Based Scheduling Scheme 

As in [6,9], we focus on the proportional fair utility
(i.e., $U(x) = \log(x)$ in (6a)) in the rest of this paper.
We consider scheduling policies for the UCS architecture
comprised of $L_{\max}$ parallel schedulers, one per cluster
size $L \in \{1, 2, \ldots, L_{\max}\}$. We describe a method for
scheduling users over the RBs from the $\lambda_L > 0$ fraction
of RBs dedicated to clusters of size $L$.

We first remark that, as in the cellular settings [6,9],
empirical evidence reveals that, in a "loaded" network,
most users are uniquely associated to a single cluster per
cluster size, i.e., for most user indices $k$, there is a single
nonzero $x_{kC}$ among all $C$'s with the same $|C|$.

Insight regarding this observation can be obtained by
examining the Karush-Kuhn-Tucker (KKT) conditions of (6),
which imply, for each cluster $C$ with $|C| = L$,

$$
\sum_{L'} \sum_{C':\, |C'| = L'} x_{kC'}\, r_{kC'} \ \ge\ \frac{r_{kC}}{\sum_{j \in C} \nu_{jL} + \mu_{kL}},
\tag{7}
$$

with equality whenever $x_{kC} > 0$, where $\nu_{jL}$ and $\mu_{kL}$ are
the Lagrange multipliers corresponding to (6b) and (6c), respectively.


In a loaded network, where the constraints (6c) are
inactive (i.e., $\sum_{C:\, |C| = L} x_{kC} < \lambda_L\ \forall k \in \mathcal{U}$),
we have the following:

Proposition 1. If (6c) are inactive $\forall k \in \mathcal{U}$, the number of
users that are served by multiple clusters of size $L$ is at
most $N_L - 1$, where $N_L$ is the number of size-$L$ clusters.

Proof: See Appendix C.

Given the limited number of fractional users per cluster
size $L$, the scheduler approximates the optimal $\{x_{kC}\}$ by
unique association activity fractions, $\{\tilde{x}_{kC}\}$, given by

$$
\tilde{x}_{kC} =
\begin{cases}
x_{kC} & \text{if } C = C^*(k) \\
0 & \text{otherwise}
\end{cases}
\tag{8}
$$

with $C^*(k) = \arg\max_{C:\, |C| = L} x_{kC}$.

Letting $\mathcal{U}_C$ denote the users for which $\tilde{x}_{kC} > 0$, we have
$\mathcal{U}_C \cap \mathcal{U}_{C'} = \emptyset$ for all $C \neq C'$ with $|C| = |C'|$.
We also let $\mathcal{U}_L = \bigcup_{C:\, |C| = L} \mathcal{U}_C$ denote the set of
users that receive non-zero activity from clusters of size $L$.

To assign user $k$ a fraction of RBs close to the desired
fraction $\alpha_k = \tilde{x}_{kC^*(k)}/\lambda_L$, we consider a max-min scheduling
policy based on virtual queues (VQ), which assumes user
$k$ receives rate $R_k = 1/\alpha_k$ when user $k$ is scheduled for
transmission over cluster $C^*(k)$ (i.e., $k \in S_{C^*(k)}(t)$). The
cluster-size $L$ scheduler performs at each $t$ a weighted sum
rate maximization (WSRM) of the form [17]:

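$$
\begin{aligned}
\max_{U \subseteq \mathcal{U}_L} \quad & \sum_{k \in U} Q_k(t)\, R_k, && \text{(9a)} \\
\text{s.t.} \quad & \sum_{k \in U} 1\{j \in C^*(k)\} \le S_j(L), \quad \forall j \in \mathcal{J}, && \text{(9b)}
\end{aligned}
$$

where the weight of user $k$ at time $t$, $Q_k(t)$, is the VQ
length of user $k$ at time $t$. For max-min fairness [17],
$Q_k(t)$ is updated as follows:

$$Q_k(t+1) = \max\{0,\ Q_k(t) - R_k(t)\} + A_k(t), \tag{9c}$$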
where

$$
R_k(t) =
\begin{cases}
R_k & \text{if user } k \text{ is scheduled at time } t \\
0 & \text{otherwise}
\end{cases}
\qquad
A_k(t) =
\begin{cases}
A_{\max} & \text{if } V > \sum_{u} Q_u(t) \\
0 & \text{otherwise}
\end{cases}
\tag{9d}
$$

with $A_{\max}$ and $V$ chosen sufficiently large [17]. Note that
in the absence of constraints (9b), the max-min scheduler
(9) schedules user $k$ the desired fraction of RBs, $\alpha_k$.

Scheduling via (9) is impractical, as it amounts to
solving for each RB $t$ an integer linear program of the form
(9a)-(9b). A number of heuristic algorithms can be used to
provide feasible (though generally suboptimal) solutions
to (9). In this paper, we consider a rudimentary greedy
algorithm. Letting $K_L = |\mathcal{U}_L|$ be the total number
of users to be served by clusters of size $L$, the greedy
algorithm for size-$L$ clusters at time $t$ operates as follows:

1. Determine a user order $\pi(k)$, where
$Q_{\pi(k)}(t) R_{\pi(k)} \ge Q_{\pi(k+1)}(t) R_{\pi(k+1)}$ for all
$k \in \{1, \ldots, K_L - 1\}$.

2. Initialization: $k = 1$, and $U = \emptyset$.

3. If the user set $U \cup \{\pi(k)\}$ satisfies all the constraints
in (9b), set $U = U \cup \{\pi(k)\}$.

4. If $k < K_L$, set $k = k + 1$ and go to step 3.

5. Output $U$ as the scheduling user set at time $t$.
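The greedy steps and the virtual-queue update can be sketched in a few lines of Python. The sketch assumes each user has a single target cluster, that users are admitted in decreasing order of queue length times virtual rate, and that every queue receives an arrival of size `A_max` whenever `V` exceeds the sum of all queue lengths; this last rule is our reading of the arrival process and should be treated as an assumption. Function and variable names are hypothetical.

```python
def greedy_schedule(Q, R, cluster_of, S_L):
    """Greedy user selection for one RB and one cluster size L.

    Q:          {user: virtual queue length}
    R:          {user: virtual rate 1/alpha_k}
    cluster_of: {user: iterable of BSs in the user's target cluster}
    S_L:        {bs: max number of users the BS can serve, S_j(L)}
    """
    # Step 1: order users by decreasing weight Q_k * R_k.
    order = sorted(Q, key=lambda k: Q[k] * R[k], reverse=True)
    load = {j: 0 for j in S_L}
    chosen = []
    for k in order:
        # Step 3: admit the user only if every BS in its cluster has room.
        if all(load[j] < S_L[j] for j in cluster_of[k]):
            chosen.append(k)
            for j in cluster_of[k]:
                load[j] += 1
    return chosen


def update_queues(Q, R, scheduled, A_max, V):
    """One step of the queue recursion max{0, Q - service} + arrival."""
    arrival = A_max if V > sum(Q.values()) else 0.0  # assumed arrival rule
    return {
        k: max(0.0, Q[k] - (R[k] if k in scheduled else 0.0)) + arrival
        for k in Q
    }


# Hypothetical example: 3 BSs, size-2 clusters, at most 3 users per BS.
S_L = {1: 3, 2: 3, 3: 3}
cluster_of = {1: (1, 2), 2: (1, 2), 3: (1, 3), 4: (2, 3), 5: (1, 2)}
Q = {1: 5.0, 2: 4.0, 3: 3.0, 4: 2.0, 5: 1.0}
R = {k: 1.0 for k in Q}

selected = greedy_schedule(Q, R, cluster_of, S_L)
Q_next = update_queues(Q, R, set(selected), A_max=1.0, V=100.0)
```

In this example the fifth user is rejected because BS 1 is already full, so four users are scheduled; its queue then grows relative to the others, which raises its priority on later RBs.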


VII. Performance Evaluation 

In this section, we present a brief simulation-based
evaluation of the proposed distributed MIMO schemes
based on the "wrap-around" checkerboard layout in Fig. 2.
There are 4 macros with $M_j = 100$ and $S_j(L) = 10L$, and
32 pico BSs with $M_j = 40$ and $S_j(L) = 4L$. One pico
BS is at the center of each white square, while 3 pico BSs
are dropped uniformly within each shaded square. Also, 15
and 90 single-antenna users are dropped uniformly in each
white and each shaded square, respectively. The macro
and pico BS transmit powers are 46dBm and 35dBm,
respectively. The path losses (in dB) for macro-user links and
pico-user links are $128.1 + 37.6 \log_{10} d$ and
$140.7 + 36.7 \log_{10} d$, respectively, with the distance $d$ in km.
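For reference, the two path-loss formulas can be evaluated with the short Python sketch below. It covers distance-based path loss only (shadowing is ignored), and the function names and sample distances are illustrative assumptions.

```python
import math


def path_loss_db(d_km, link):
    """Distance-based path loss in dB for a macro-user or pico-user link."""
    if link == "macro":
        return 128.1 + 37.6 * math.log10(d_km)
    if link == "pico":
        return 140.7 + 36.7 * math.log10(d_km)
    raise ValueError("link must be 'macro' or 'pico'")


def rx_power_dbm(tx_dbm, d_km, link):
    """Received power in dBm, ignoring shadowing and array gains."""
    return tx_dbm - path_loss_db(d_km, link)


macro_rx = rx_power_dbm(46.0, 1.0, "macro")   # macro BS at 1 km
pico_rx = rx_power_dbm(35.0, 0.1, "pico")     # pico BS at 100 m
```

With these numbers a pico BS at 100 m delivers more received power than a macro BS at 1 km despite its lower transmit power, which illustrates why max-power association is a poor fit for HetNets.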

We consider two distinct macro-pico operation scenarios:
(i) macros and picos operate on the same band, with
cluster sizes up to $L_{\max} = 4$; (ii) macros are given 20%
of the RBs for cellular transmission, and picos are given
the remaining 80% for distributed MIMO with $L_{\max} = 4$.

Figs. 3 and 4 compare the proposed distributed MIMO
schemes^5 against network-optimized cellular transmission
[9] and max-SINR based association. Fig. 3 shows the
user-rate geometric mean for each scheme and each operation
scenario considered. As the figure reveals, unique
association (i.e., the $\{\tilde{x}_{kC}\}$) yields almost the same
performance as the optimal solution, verifying our conclusion
that the number of users served by multiple clusters per
architecture is limited. Also, the proposed greedy VQ based
scheduler performs within 90% of the NUM optimal value.

^5 The NUM problem for scenario (ii) is a simple extension of (6).

Fig. 2. A 2000m x 2000m network with 4 macro and 32 pico BSs.

[Bar chart. Vertical axis from 0 to 1. Bars for two scenarios: macros & picos share resources; macros & picos use orthogonal resources. Schemes: NUM solution, unique association, greedy VQ scheduling, network-optimized cellular, max-SINR (cellular).]

Fig. 3. User-rate geometric means for various schemes under two
distinct macro-pico operational scenarios.

More importantly, it significantly outperforms network-optimized
cellular operation under both scenarios. Note
that although the orthogonal resource allocation with optimal
user association in the cellular case performs better than the
shared resource allocation in our setting, which operation
scenario is better depends strongly on the system parameters
(e.g., channel model, transmit power and BS density).

Fig. 4 shows the corresponding user-rate cumulative
distribution functions. As the figure shows, the proposed
distributed MIMO schemes yield about a 2x gain in 5th
percentile rates with respect to the optimal cellular scheme
[9], under both macro-pico operation scenarios.

VIII. Conclusion 

We present techniques for harmonized use of cellular
and CoMP transmission over massive MIMO HetNets.
The techniques rely on using a class of distributed MIMO
transmission schemes, which do not require CSI exchanges
among BSs, and can enable flexible (user-specific) CoMP
transmission. We use properties of the distributed MIMO
schemes in the massive MIMO regime to formulate resource
allocation as a convex NUM problem, and present
scheduling policies whose goal is to approximate the
resulting optimized resource allocations. As our simulations
show, the proposed operation offers significant
performance gains with respect to the network-optimized
cellular-only massive MIMO operation [9], especially at
the cell edge. More dynamic settings (e.g., users with
high mobility) are left for future work. The investigation
of other simulation settings (e.g., different pc) is also of
interest.

Fig. 4. User-rate CDFs for various schemes.
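Appendix A

Proof of Spectral Efficiency Using ZF Precoding

In this paper, we assume that each BS has perfect CSI.
Techniques in [15,16] can be applied to derive the results.
For completeness, we provide the proof as follows. We
use $E\left[\frac{S}{I+N}\right] \approx \frac{E[S]}{E[I+N]}$ to approximate
the SINR in the calculation of ergodic spectral efficiency in the
massive MIMO regime, which is shown to be quite close to the
exact asymptotic spectral efficiency [16].

Adopting ZF precoding, the precoding matrix at
BS $j$ is $F_j = G_j \left(G_j^H G_j\right)^{-1} \Lambda_j^{1/2}$, where
$\Lambda_j$ is the normalizing coefficients matrix. In this case, the
intra-cell interference is 0. Denoting the $k$th diagonal element
of $\Lambda_j$ by $a_{kj}$ and plugging the precoding matrix
$F_j = G_j \left(G_j^H G_j\right)^{-1} \Lambda_j^{1/2}$ into the received
signal, the SINR at user $k$ from $C$ is

$$
\mathrm{SINR}_{kC} =
\frac{\sum_{j \in C} \sum_{l \in C} \sqrt{\frac{P_j P_l\, a_{kj}\, a_{kl}}{S_j(|C|)\, S_l(|C|)}}}
{1 + \left| \sum_{l \notin C} \sum_{u \in S_{C'}(t)} \sqrt{\frac{P_l}{S_l(|C'|)}}\; g_{kl}^H f_{ul}\, s_u \right|^2},
\tag{10}
$$

where $C'$ denotes the cluster including $l$ that is different
from $C$. Using similar techniques in the proof of Theorem
III-1 in [18], we can show

$$
\frac{a_{kj}}{S_j(|C|)} \to \frac{\beta_{kj}\left(M_j - S_j(|C|) + 1\right)}{S_j(|C|)}
$$

as $M_j, S_j(|C|) \to \infty$ with fixed ratio $S_j(|C|)/M_j \le 1$.
Then we have

$$
\sum_{j \in C} \sum_{l \in C} \sqrt{\frac{P_j P_l\, a_{kj}\, a_{kl}}{S_j(|C|)\, S_l(|C|)}}
\to \sum_{j \in C} \sum_{l \in C} \sqrt{P_j P_l \beta_{kj} \beta_{kl}}\; b_j b_l,
\quad \text{where } b_j = \sqrt{\frac{M_j - S_j(|C|) + 1}{S_j(|C|)}}.
$$

As for the interference, we have

$$
E\left| \sum_{l \notin C} \sum_{u \in S_{C'}(t)} \sqrt{\frac{P_l}{S_l(|C'|)}}\; g_{kl}^H f_{ul}\, s_u \right|^2
= \sum_{l \notin C} \sum_{u \in S_{C'}(t)} \frac{P_l}{S_l(|C'|)}\, E\left| g_{kl}^H f_{ul} \right|^2
= \sum_{l \notin C} P_l \beta_{kl},
$$

where the last step follows from the fact that channels and
precoders of different users are independent. Based on the
approximation $E\left[\frac{S}{I+N}\right] \approx \frac{E[S]}{E[I+N]}$, we complete
the proof by plugging the above results into (10).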


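Appendix B

Proof of Spectral Efficiency Using MRT Precoding

We first give the following properties of MRT in the
massive MIMO regime.

1) We have $\|g_{kj}\|^2 = g_{kj}^H g_{kj} = \beta_{kj} \sum_{i=1}^{M_j} |h_{kj,i}|^2$.
Recalling that the $h_{kj,i}$ are i.i.d. Gaussian, we have
$\|g_{kj}\|^2 / M_j \to \beta_{kj}\, E[h_{kj,i}^* h_{kj,i}] = \beta_{kj}$, as $M_j$ and
$S_j(|C|)$ become large with a fixed ratio $S_j(|C|)/M_j \le 1$.

2) Plugging in $f_{nj} = g_{nj}/\|g_{nj}\|$, we have

$$
\left| g_{kj}^H f_{nj} \right|^2
= \beta_{kj}\, \frac{\left| \sum_{i=1}^{M_j} h_{kj,i}^* h_{nj,i} \right|^2}{\|h_{nj}\|^2},
$$

which converges to

$$
\frac{\beta_{kj}}{M_j} \left( M_j\, E\left[|h_{kj,1}|^2 |h_{nj,1}|^2\right]
+ M_j (M_j - 1)\, E\left[h_{kj,1}^* h_{nj,1} h_{kj,2} h_{nj,2}^*\right] \right) = \beta_{kj}
$$

as $M_j \to \infty$, since $h_{kj,i}$ and $h_{nj,i}$ are i.i.d. Gaussian for $n \neq k$.

Using the above two properties and similar techniques
in Appendix A, we have

$$
\mathrm{SINR}_{kC}
= \frac{\left( \sum_{j \in C} \sqrt{\frac{P_j}{S_j(|C|)}}\, \|g_{kj}\| \right)^2}
{1 + \sum_{j \in C} \frac{S_j(|C|) - 1}{S_j(|C|)} P_j \beta_{kj} + \sum_{l \notin C} P_l \beta_{kl}}
\to \frac{\sum_{j \in C} \sum_{l \in C} \sqrt{\frac{P_j P_l M_j M_l \beta_{kj} \beta_{kl}}{S_j(|C|)\, S_l(|C|)}}}
{1 + \sum_{j \in C} \frac{S_j(|C|) - 1}{S_j(|C|)} P_j \beta_{kj} + \sum_{l \notin C} P_l \beta_{kl}}.
\tag{11}
$$

Plugging (11) into $\log_2(1 + \mathrm{SINR}_{kC})$, we complete the
proof.

Fig. 5. The graph representation of the associations of three users.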

Appendix C

Proof of Proposition 1

We use techniques similar to the proof of Proposition 3
in [7], where a graph is used to represent the
association, and KKT conditions (7) restrict the structure
of the graph.

We denote the graph by $G_1$, whose nodes represent
the users, and the edge between two nodes represents the
BS cluster that serves the two users in the considered
architecture. Each node has an ID indicating the user
index, while each edge has a color that identifies the BS
cluster. For example, Fig. 5 shows that user $k$ is served
by both clusters $C_1$ and $C_2$, and user $m$ is served by both
clusters $C_1$ and $C_3$.

In a heavily loaded network, where the constraints (6c)
are inactive (i.e., $\sum_{C:\, |C| = L} x_{kC} < \lambda_L$) in the optimal
solutions, we have $\mu_{kL} = 0$ for all $k$. If there are two
users $k$ and $m$ being served by size-$L$ clusters $C_1$ and
$C_2$ (i.e., $x_{kC_1} > 0$, $x_{kC_2} > 0$, $x_{mC_1} > 0$, $x_{mC_2} > 0$),
we have

$$
R_k = \frac{r_{kC_1}}{\sum_{j \in C_1} \nu_{jL}} = \frac{r_{kC_2}}{\sum_{j \in C_2} \nu_{jL}},
\qquad
R_m = \frac{r_{mC_1}}{\sum_{j \in C_1} \nu_{jL}} = \frac{r_{mC_2}}{\sum_{j \in C_2} \nu_{jL}}
$$

from KKT condition (7), where $R_k = \sum_{C} x_{kC}\, r_{kC}$. Thus, we have

$$
\frac{r_{kC_1}}{r_{kC_2}} = \frac{r_{mC_1}}{r_{mC_2}},
\tag{12}
$$

which is true with probability 0. Therefore, almost surely,
any two users can share at most one common cluster
in each architecture. Similarly, we consider an example of
three users $k, m, i$ and clusters $C_1, C_2, C_3$ as illustrated in
Fig. 5. We consider the following three cases:

1) If clusters $C_1$, $C_2$ and $C_3$ are different, we have

$$
\frac{r_{kC_1}}{r_{kC_2}}
= \frac{\sum_{j \in C_1} \nu_{jL}}{\sum_{j \in C_3} \nu_{jL}} \cdot \frac{\sum_{j \in C_3} \nu_{jL}}{\sum_{j \in C_2} \nu_{jL}}
= \frac{r_{mC_1}}{r_{mC_3}} \cdot \frac{r_{iC_3}}{r_{iC_2}},
\tag{13}
$$

which is true with probability 0.

2) If $C_1 = C_2 \neq C_3$, we have that users $m$ and $i$ are served
both by clusters $C_1$ and $C_3$, which is true with probability
0 from (12).

3) If $C_1 = C_2 = C_3$, we have that users $k$, $m$ and $i$ are
served by the same cluster, which is possible. In this case,
the graph becomes a complete graph.

Therefore, the graph $G_1$ with three users either contains
a loop with the same color edges or no loop. We can get
a similar result for graph $G_1$ with more than three users,
where the users served by the same BS cluster constitute a
complete graph. Thus, we generate a new graph, denoted
by $G_2$, where each node represents a cluster. There is an
edge between two nodes in $G_2$ if these two nodes (i.e.,
clusters) have a common vertex in $G_1$ (i.e., there is at
least one user served by both these two clusters). Thus
the number of users who are served by more than one
cluster is limited by the number of edges of $G_2$. Note that
there are $N_L$ nodes and no loop in $G_2$. Thus, $G_2$ is a tree,
whose maximal number of edges is one less than the
number of nodes (i.e., $N_L - 1$). Therefore, the number of
users served by multiple BS clusters equals the number of
edges in graph $G_2$, which is no more than $N_L - 1$.



